Refactor: Update DataframeAbstract #547

Merged


@nautics889 nautics889 commented Sep 10, 2023

  1. Update parameters for the to_dict method in DataframeAbstract. Add the into parameter (defaults to dict), since it is present in the original pandas.core.frame.DataFrame.to_dict. Add the as_series parameter (defaults to True) for polars DataFrames. Add an engine type check: depending on the type of self._engine, the call is proxied either to pandas.core.frame.DataFrame.to_dict or to polars.DataFrame.to_dict.
  2. Add tests for calling to_dict(). Add a test checking that casting to a dictionary works. Add a test checking that parameters are passed correctly according to the ._engine type of SmartDataframe.
  3. Add docstrings. Improve docstring coverage for the proxy methods in the DataframeAbstract class.
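
The engine check described in item 1 might look roughly like the sketch below. This is not the code from the PR: the class name `EngineProxy`, the string values "pandas" and "polars" for `_engine`, and the two stand-in dataframe classes are assumptions, chosen so the example runs without either library installed. Only the parameter names (`orient`, `into`, `as_series`) come from the description and from the upstream `to_dict` signatures.

```python
from collections import OrderedDict


class FakePandasFrame:
    """Stand-in that mimics the (orient, into) part of pandas' to_dict()."""

    def __init__(self, columns):
        self._columns = columns

    def to_dict(self, orient="dict", into=dict):
        if orient == "list":
            # column name -> list of values
            return into((name, list(vals)) for name, vals in self._columns.items())
        # "dict" orientation: column name -> {row index -> value}
        return into(
            (name, into(enumerate(vals))) for name, vals in self._columns.items()
        )


class FakePolarsFrame:
    """Stand-in that mimics the as_series flag of polars' to_dict()."""

    def __init__(self, columns):
        self._columns = columns

    def to_dict(self, as_series=True):
        # Real polars returns Series objects when as_series=True; tuples are
        # used here only so the two modes are distinguishable.
        wrap = tuple if as_series else list
        return {name: wrap(vals) for name, vals in self._columns.items()}


class EngineProxy:
    """Hypothetical proxy that forwards to_dict() according to the engine."""

    def __init__(self, dataframe, engine):
        self.dataframe = dataframe
        self._engine = engine  # assumed to be "pandas" or "polars"

    def to_dict(self, orient="dict", into=dict, as_series=True):
        if self._engine == "pandas":
            return self.dataframe.to_dict(orient=orient, into=into)
        if self._engine == "polars":
            return self.dataframe.to_dict(as_series=as_series)
        raise RuntimeError(f"Unknown dataframe engine: {self._engine!r}")
```

With this shape, callers can pass all three keyword arguments regardless of engine; each branch forwards only the arguments its backend understands.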

Summary by CodeRabbit


  • Refactor: Updated the df_type function in pandasai/helpers/df_info.py to return None for invalid or None dataframes, enhancing error handling.
  • New Feature: Introduced a private variable _engine in DataframeAbstract class in pandasai/smart_dataframe/abstract_df.py to handle different dataframe engines.
  • Documentation: Added docstrings to methods in DataframeAbstract class improving code readability and understanding.
  • Test: Added new test cases in tests/test_smartdataframe.py for testing the to_dict method of SmartDataframe with various parameters.

* (refactor): update behaviour of `to_dict()` method, make it possible
  to accept `into` and `as_series` parameters (the last one is for
  polars dataframes).
* (tests): add tests for casting the dataframe to a dictionary, add
  tests for passing parameters in the proxy-call to dataframe's
  `to_dict()` method.
* (docs): add docstrings for proxy-methods in `DataframeAbstract` class

codecov-commenter commented Sep 10, 2023

Codecov Report

Merging #547 (4cbd0e1) into feature/v1.2 (f5c4be0) will increase coverage by 0.03%.
The diff coverage is 87.50%.

@@               Coverage Diff                @@
##           feature/v1.2     #547      +/-   ##
================================================
+ Coverage         83.39%   83.42%   +0.03%     
================================================
  Files                52       52              
  Lines              2584     2589       +5     
================================================
+ Hits               2155     2160       +5     
  Misses              429      429              
Files Changed                              Coverage Δ
pandasai/smart_dataframe/abstract_df.py    54.38% <85.71%> (+2.09%) ⬆️
pandasai/helpers/df_info.py                86.66% <100.00%> (ø)

* (chore): update return type for `df_type` function
@nautics889 nautics889 marked this pull request as ready for review September 10, 2023 14:36

coderabbitai bot commented Sep 10, 2023

Walkthrough

This pull request enhances the flexibility and readability of the pandasai package. It modifies the return type of df_type function, adds a new variable _engine to DataframeAbstract, improves method documentation, and introduces comprehensive tests for the to_dict method in SmartDataframe.

Changes

File Path | Summary
.../helpers/df_info.py | Modified the return type of df_type function from str to Union[str, None] for better flexibility.
.../smart_dataframe/abstract_df.py | Added _engine variable, improved docstrings, modified to_dict method to include default values and engine check.
tests/test_smartdataframe.py | Introduced new fixture smart_dataframe_mocked_df and test methods test_to_dict and test_to_dict_passing_parameters for thorough testing.
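
A parameter-passing test of the kind summarized for tests/test_smartdataframe.py could be structured as in the sketch below, using `unittest.mock.MagicMock` in place of a real dataframe. `ProxyUnderTest` and both check functions are hypothetical; the actual `smart_dataframe_mocked_df` fixture and test bodies are not shown on this page, so this only illustrates the general approach.

```python
from unittest.mock import MagicMock


class ProxyUnderTest:
    """Hypothetical minimal proxy standing in for SmartDataframe."""

    def __init__(self, dataframe, engine):
        self.dataframe = dataframe
        self._engine = engine  # assumed to be "pandas" or "polars"

    def to_dict(self, orient="dict", into=dict, as_series=True):
        if self._engine == "pandas":
            return self.dataframe.to_dict(orient=orient, into=into)
        return self.dataframe.to_dict(as_series=as_series)


def check_pandas_parameters():
    # The mock records how to_dict() was called on the wrapped dataframe.
    mocked_df = MagicMock()
    ProxyUnderTest(mocked_df, "pandas").to_dict(orient="list", into=dict)
    mocked_df.to_dict.assert_called_once_with(orient="list", into=dict)


def check_polars_parameters():
    # For the polars engine only as_series should be forwarded.
    mocked_df = MagicMock()
    ProxyUnderTest(mocked_df, "polars").to_dict(as_series=False)
    mocked_df.to_dict.assert_called_once_with(as_series=False)
```

Mocking the wrapped dataframe keeps the test independent of pandas and polars, so it verifies only the forwarding logic.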

🐇💻

In the land of code where the shadows lie,

We weave the changes, under the sky.

From return types to private store,

Tests and docs, we give you more.

So celebrate this pull request's might,

For it brings clarity, like dawn's first light. 🌅


@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Commits Files that changed from the base of the PR and between f5c4be0 and 4cbd0e1 commits.
Files selected for processing (3)
  • pandasai/helpers/df_info.py (1 hunks)
  • pandasai/smart_dataframe/abstract_df.py (3 hunks)
  • tests/test_smartdataframe.py (3 hunks)
Files skipped from review due to trivial changes (2)
  • pandasai/smart_dataframe/abstract_df.py
  • tests/test_smartdataframe.py
Additional comments (Suppressed): 1
pandasai/helpers/df_info.py (1)
  • 14-14: The function signature has been changed from returning str to Union[str, None]. Ensure that all calls to this function throughout the codebase can handle a None return value.
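
To illustrate the point about callers, here is a minimal sketch of a function with the new return type and a caller that handles `None` explicitly. The names `df_type_sketch` and `describe_engine` are hypothetical, and detecting the engine from the object's module name is an assumption made to avoid importing pandas or polars; the real `df_type` helper may work differently.

```python
from typing import Union


def df_type_sketch(df) -> Union[str, None]:
    """Return "pandas" or "polars" based on the object's module, else None.

    The module-name check is an assumption for this example only.
    """
    if df is None:
        return None
    root_module = type(df).__module__.split(".")[0]
    return root_module if root_module in ("pandas", "polars") else None


def describe_engine(df) -> str:
    """Example caller that handles the None case instead of assuming a str."""
    engine = df_type_sketch(df)
    if engine is None:
        return "unsupported dataframe"
    return f"{engine} dataframe"
```

A caller that previously concatenated or compared the result as a string would need a guard like the one in `describe_engine`.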

@gventuri gventuri merged commit c05733b into Sinaptik-AI:feature/v1.2 Sep 12, 2023
@gventuri
Collaborator

Thanks a lot @nautics889, great improvement (in particular the .to_dict)! Merging

gventuri added a commit that referenced this pull request Sep 13, 2023
* feat: add SQL connectors

* feat: add .env configuration for mysql and postgres

* feat: a smart df can now accept a connector as a param

* fix: correct error prompt now handles multiple dataframes

* feat: cache the dataframes from third parties

* fix: truncate long text from the sample

* feat: extracting filters (#451) (#483)

* fix: setters were not overriding some properties in some cases

* fix: remove multiple df overwrites (#436)

* improved on _is_df_overwrite method to check all targets

* test: add test for multiple df overwrite

---------

Co-authored-by: Gabriele Venturi <[email protected]>

* fix: prevent to_* pandas methods from running

* Release v0.8.3

* fix: environment for executing code (#419)

* fix: environment for executing code

* (fix): set `matplot.pyplot` package under "plt" alias in
  `_get_environment()`

* feat: Try to import a package when NameError (#430)

* (refactor): split calling of `exec()` for code and handling of errors
  into separate methods;
* (feat): add trying to import the `exc.name` and add one to context
  when `exc` is an instance of `NameError`;
* (tests): add tests for new methods;

* fix: add logging for `NameError` handling (#430)

* (fix): log the general exception raised when trying to fix
  `NameError`

* fix: compatibility with python 3.9 (#430)

* (fix): update `handle_error()` method, add parsing `exc.args[0]` for
  cases with NameError if there is no attribute "name" of object of the
  exception.

* fix: update for `handle_error()`(#430)

* (fix): add checking if the library to be imported is present in the
  whitelist of third-party packages;
* (fix): remove "plt" when forming environment for code to be run;
* (fix): add raising an exception that was caught during fixing code
  with the error correction framework;
* (tests): add test-case for retrying to fix code with the error
  correction framework unsuccessfully;
* (tests): fix issue with inappropriate instantiation of numpy's `array`
  object within several test methods;

* Release v0.8.4

* Release v1.0b1

* Release v1.0

* fix: convert polars to pandas df and back to polars after exec the code

* docs: update docs for v1

* Release v1.0.1

* fix: remove shortcuts from backwards-compatible version

* Release v1.0.2

* refactor: minor updates (#446)

* (refactor): inappropriate type hint for `_has_run` in base class for
  middlewares;
* (style): remove built-ins name shadowing in `__init__()` method of
  `Prompt` class;

* docs: update LLMs documentation code to match the last version (#445)

I have updated OpenAI, Starcoder, Falcon, and AzureOpenAI.

But I haven't edited GooglePalm because it still uses api_key in the last version not api_token

Co-authored-by: Gabriele Venturi <[email protected]>

* fix: do not override setters and getters with __getattr__

* Release v1.0.3

* fix: huggingface models (#454)

* Release v1.0.4

* fix: update code manager to convert prompt id to string (#462)

* test: increase test coverage (#460)

* Update LLMs documentation code to match the last version

I have updated OpenAI, Starcoder, Falcon, and AzureOpenAI.

But I haven't edited GooglePalm because it still uses api_key in the last version not api_token

* Create test_google_vertexai.py

Create test cases for Class Google Vertexai. I have tested the initialization function with the default model provided and a custom one as well.
Then I have built test cases for validation of class when model provided and without model.

* Update test_smartdataframe.py

Created test cases for _load_df() and _import_from_file() functions

* test: remove verbose test

* test: unskip skipped test

* test: test multiple invalid file formats

* test: fix tests

---------

Co-authored-by: Gabriele Venturi <[email protected]>

* fix: recursive error on df last_result property (#468) (#469)

* fix: recursive error on df last_result property (#468)

* fix: change last_result type to str

* test: add tests

---------

Co-authored-by: Gabriele Venturi <[email protected]>

* Release v1.0.5

* fix(llms): AzureOpenAI call method signature (#476)

* fix: added last_error property reference in smart_datalake (#475)

* fix: temporarily disable multi-turn capacity, causing hallucinations

* Release v1.0.6

* fix: create required folders as the datalake is initialized

* Release v1.0.7

* fix: create required folders in the project folder

* Release v1.0.8

* chore: fix typing in the logger logs (#464)

* docs: add video explanation for SmartDataframe

* feat: extracting filters (#451)

* (feat): add `_extract_filters()`, `_extract_comparisons()`,
  `_tokenize_operand()` methods to `CodeManager`;
* (tests): add basic tests;

* test: extracting filters (#451)

* (tests): rename `test_extract_filters()` to
  `test_extract_filters_col_index()`, update asserting according to
  task requirements;
* (tests): add `test_extract_filters_col_index_multiple_df()` for
  multiple dfs;

* fix: update according to requirements (#451)

* (feat): add module `node_visitor.py`, add `AssignmentVisitor` in
  `node_visitors.py`;
* (fix): `_extract_filters()` now returns dictionary;
* (feat): add grouping by df number when forming;
* (tests): update naming in test cases;
* (tests): add test method with non default (not `df`) variable name;

* feat: extracting filters (#451)

* (feat): add `CallVisitor` to `node_visitors.py` module;
* (feat): add extracting comparisons in `filter()` method being called
  as an attribute of object named `pl` or `polars`;
* (tests): test methods for extracting polars-like filters;

* refactor: extracting filters (#451)

* (refactor): add handling exceptions in `_extract_filters()` method;

* docs: extracting filters (#451)

* (docs): add documentations for implemented functions;
* (style): type hinting;

* test: merge similar tests

---------

Co-authored-by: Gabriele Venturi <[email protected]>
Co-authored-by: Aymane Hachcham <[email protected]>
Co-authored-by: Omar Elsherif <[email protected]>
Co-authored-by: Hanchung Lee <[email protected]>
Co-authored-by: Sanchit Bhavsar <[email protected]>
Co-authored-by: Massimiliano Pronesti <[email protected]>

* docs: add example of usage with connectors

* feat: allow to save and load config for connectors

* feat: add dynamic filters to connectors

* feat: add yahoo finance connector

* fix: prevent sql injections

* refactor: improve sql connector

* test: test yahoo finance connector

* refactor: update `DataframeAbstract` (#547)

* refactor: `to_dict()` parameters

* (refactor): update behaviour of `to_dict()` method, make it possible
  to accept `into` and `as_series` parameters (the last one is for
  polars dataframes).
* (tests): add tests for casting the dataframe to a dictionary, add
  tests for passing parameters in the proxy-call to dataframe's
  `to_dict()` method.

* docs: docstrings for `DataframeAbstract`

* (docs): add docstrings for proxy-methods in `DataframeAbstract` class

* chore: type hint issue

* (chore): update return type for `df_type` function

* feat: improve prompt in the way it handles conversation and dataframes

---------

Co-authored-by: Ihor <[email protected]>
Co-authored-by: Aymane Hachcham <[email protected]>
Co-authored-by: Omar Elsherif <[email protected]>
Co-authored-by: Hanchung Lee <[email protected]>
Co-authored-by: Sanchit Bhavsar <[email protected]>
Co-authored-by: Massimiliano Pronesti <[email protected]>